Introduction

This final project highlights some of the techniques acquired during the SDS course, in particular the construction and simulation of non-linear regression and classification models using tools such as RJAGS and MCMC. We will consider two models for non-linear regression and two models for binary classification of the state of stars: primordial or advanced.
The dataset is taken from Kaggle and contains data on 6 different types of stars, classified according to features such as:

The Hertzsprung–Russell diagram

HR Diagram

The Hertzsprung–Russell diagram is a scatter plot of stars showing the relationship between the stars’ absolute magnitudes or luminosities and their stellar classifications or effective temperatures. It turned out to be of considerable importance in understanding the analyzed data, as it allows a fast and usable visualization of the star classes and their reference parameters. After a first exploration, I plotted the correlation matrix to get an immediate picture of the relationships between the input variables. Non-numerical variables were excluded from this visualization.

Correlation Plot

ggcorrplot(corr, method = "circle")
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
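For reference, the `corr` matrix passed to `ggcorrplot` can be built by dropping the non-numerical columns first. The data frame below is a small illustrative stand-in; the column names are assumptions based on the Kaggle CSV:

```r
# toy stand-in for the stars data frame (column names follow the Kaggle CSV)
stars <- data.frame(
  Temperature..K.        = c(3068, 3042, 2600, 2800, 1939),
  Luminosity.L.Lo.       = c(0.0024, 0.0005, 0.0003, 0.0002, 0.000138),
  Radius.R.Ro.           = c(0.17, 0.1542, 0.102, 0.16, 0.103),
  Absolute.magnitude.Mv. = c(16.12, 16.6, 18.7, 16.65, 20.06),
  Star.color             = c("Red", "Red", "Red", "Red", "Red")  # non-numerical, excluded
)
num_only <- stars[sapply(stars, is.numeric)]   # keep numeric columns only
corr <- round(cor(num_only), 2)                # correlation matrix for ggcorrplot
corr
```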

As the correlation plot of the numerical input variables shows, temperature, luminosity, and radius are positively correlated, as stars in these six classes are celestial bodies that steadily increase in size even in their final life stage. Absolute magnitude, by contrast, is negatively correlated with the other variables: in the final phase of a star’s life, although its radius is significantly greater than in all other life phases, its brightness is almost zero. The radius shows only a weak correlation with temperature, since high-mass stars tend not to have high temperatures in the last two life phases.

Features Distribution

grid.arrange(plt.dens.temp,plt.dens.lum,plt.dens.radius,plt.dens.magn,ncol=2,nrow=2)

Abs.Magnitude vs Temperature Stars Clustering

Below I report a plot that shows the clusters of the various star types as a function of absolute magnitude.

plt1.6class

Main Sequence Data Restriction

As shown in the clustering plot, I restricted my analysis to the main sequence, the portion of a star’s lifetime in which the core is in hydrostatic equilibrium: outward thermal pressure from the hot core is balanced by the inward pressure of gravitational collapse from the overlying layers. The strong dependence of the rate of energy generation on temperature and pressure helps to sustain this balance. Thanks to this characteristic balance, the variables follow a functional trend similar to a decreasing function, which I will now investigate with two models that allow me to fit the data and predict new observations.

grid.arrange(plt.dens.temp.res,plt.dens.lum.res,plt.dens.radius.res,plt.dens.magn.res,ncol=2,nrow=2)

Model I

In the first model, the decreasing trend of the data is described by a non-linear model: observations are normally distributed with mean given by a decaying exponential function. An always-positive function such as the exponential was not enough, because the magnitude also reaches negative values, so I introduced another (root) function. The combination of the two produces the desired trend: in the early stages the exponential function dominates, while later, as its effect vanishes, the root function emerges and completes the trend.
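A quick numerical check (with illustrative parameter values, not the fitted ones) confirms this hand-off between the two components of the mean function used in the JAGS model below:

```r
# f(x) = alpha*(x-k)^beta + gamma*exp(delta*(x-k)); illustrative parameter values
alpha <- 2; beta <- 0.5; gamma <- 2; delta <- -50; k <- 0.5
f_pow <- function(x) alpha * (x - k)^beta          # slowly growing root term
f_exp <- function(x) gamma * exp(delta * (x - k))  # fast-decaying exponential term
x_near <- k + 0.001   # just above the shift k: the exponential term dominates
x_far  <- k + 1.0     # far from k: the exponential is negligible, the root term dominates
c(near = f_exp(x_near) / f_pow(x_near),
  far  = f_exp(x_far)  / f_pow(x_far))
```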

We assume that each observation \(Y_i\) is drawn from a Normal distribution with mean \(\mu_i\) and precision \(\tau = \frac{1}{\sigma^2}\).

Likelihood function: \[\begin{equation*} Y_i \sim \mathcal{N}\big(\mu_i,\tau^{-1}\big) \end{equation*}\]

\[\begin{equation*} \mu_i = f(x_i)= \alpha(x_i - k)^{\beta} + \gamma \exp\big\{\delta(x_i -k)\big\} \end{equation*}\] \(i = 1,\dots,40\)

\[\begin{align*} L(\alpha,\beta,\gamma,\delta,k,\tau \mid y,x) = \prod_{i=1}^n \Big(\frac{\tau}{2\pi}\Big)^{\frac{1}{2}} \exp\bigg\{-\frac{\tau(y_i-\mu_i)^2}{2} \bigg\} = \end{align*}\]

\[\begin{align*} = \Big(\frac{\tau}{2\pi}\Big)^{\frac{n}{2}} \exp\bigg\{-\frac{\tau\sum_{i=1}^{n}(y_i-\mu_i)^2}{2} \bigg\}= \end{align*}\]

\[\begin{align*} = \Big(\frac{\tau}{2\pi}\Big)^{\frac{n}{2}} \exp\bigg\{-\frac{\tau\sum_{i=1}^{n}\Big(y_i-\big[\alpha(x_i - k)^{\beta} + \gamma \exp\{\delta(x_i -k)\}\big]\Big)^2}{2} \bigg\} \end{align*}\]

A choice for the prior of the precision \(\tau\) is the Gamma distribution \(\Gamma(a,b)\) with \(a,b \in \mathbb{R}^+\),

so the prior can be written as:

\[\begin{align*} \pi(\tau) = \frac{b^a}{\Gamma(a)} \tau^{a-1}e^{-b\tau} \end{align*}\]
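The choice \(a = b = 0.001\) used in the JAGS model below makes this prior very weakly informative: the prior mean is \(a/b = 1\) but the prior variance \(a/b^2 = 1000\), so the precision is allowed to range over many orders of magnitude. A quick sanity check in R:

```r
# Gamma(a, b) prior on the precision tau, with the values used in the JAGS model
a <- 0.001; b <- 0.001
prior_mean <- a / b     # = 1
prior_var  <- a / b^2   # = 1000: a very diffuse prior
set.seed(1)
tau_draws <- rgamma(10000, shape = a, rate = b)
range(tau_draws)        # draws span many orders of magnitude
```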

model
    {
        for( i in 1:N ) {
            Y[i] ~ dnorm(mu[i], tau)   # likelihood; tau is the precision
            mu[i] <-  alpha*pow((x[i]-k),beta) + gamma*exp(delta*(x[i]-k))
        }
        # priors
        alpha   ~ dnorm(1, 1.0E-1)
        beta    ~ dbeta(1.0, 1.0)
        gamma   ~ dnorm(0, 1.0E-1)
        delta   ~ dnorm(0, 1.0E-1)
        tau     ~ dgamma(0.001, 0.001)
        k       ~ dunif(0,min(x))

        sigma <- 1 / sqrt(tau)

    }

Synthetic Data Generation

Having defined the model, I created some synthetic data to study how well the model is able to recover the parameters; as shown below, it succeeds by a good margin.

################ SYNTHETIC DATA GENERATION FOR EXPONENTIAL MODEL

sintetic_data_exp <- function(alpha,beta,gamma,delta,k,n_sample,noise_factor,left_bound,right_bound){
  
  x  <- runif(n_sample, min = left_bound, max = right_bound)
  mu <- alpha*((x - k)^beta) + gamma*exp(delta*(x - k))
  # note: rnorm() takes a standard deviation, not a precision as JAGS's dnorm does,
  # so the Gamma draws are used here directly as noise standard deviations
  noise_sd <- rgamma(n_sample, 0.001, 0.001)
  y <- rnorm(n_sample, mu, noise_sd)
  y_noise <- jitter(y, factor = noise_factor, amount = NULL)
  
  # order by x so the (Temperature, Magnitude) pairs stay aligned:
  # sorting x and y separately would break the pairing
  ord <- order(x)
  return(as.data.frame(list("Temperature" = x[ord],
                            "Magnitude" = y[ord],
                            "Magnitude_w_noise" = y_noise[ord])))
}

sn_data_exp = sintetic_data_exp(alpha=2,beta=.5,gamma=2,delta=-50,k=.5,n_sample=400,noise_factor=0,left_bound=0.5,right_bound=4.6)

Testing Model I on Synthetic Data

We now fit the model to the synthetic data and check how well it works. The evaluation and comparison of the two models will be done using the DIC:
the model with the lower DIC indicates a better model-data fit.

syn_data_j_exp <- list("Y"=sn_data_exp$Magnitude,"x"=sn_data_exp$Temperature,"N"=40)
inits = list(list(alpha = 1.2, beta = 0.2,  gamma = 0.2,delta=-50, tau = 1),
             list(alpha = 1.2, beta = 0.2,  gamma = 0.2,delta=-50, tau = 1))
params = c("alpha","beta","gamma","delta","k","tau")
model_exp_f=jags(data=syn_data_j_exp, inits=inits,
               parameters.to.save=params,
               model.file="models/exp_model.txt",
               n.chains=2,
               n.burnin=1000,
               n.iter=10000,
               n.thin = 10)
## module glm loaded
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 40
##    Unobserved stochastic nodes: 6
##    Total graph size: 1098
## 
## Initializing model
model_exp_f
## Inference for Bugs model at "models/exp_model.txt", fit using jags,
##  2 chains, each with 10000 iterations (first 1000 discarded), n.thin = 10
##  n.sims = 1800 iterations saved
##          mu.vect sd.vect     2.5%      25%     50%     75%   97.5%  Rhat n.eff
## alpha      3.354   1.848    1.421    1.758   2.780   4.681   7.495 1.032    53
## beta       0.196   0.210    0.009    0.018   0.036   0.360   0.693 1.037    48
## delta    -35.610  41.674  -98.741  -83.984  -0.776  -0.386  -0.205 1.006   250
## gamma     -3.555   1.883   -7.797   -4.623  -3.454  -2.420  -0.265 1.001  1800
## k          0.481   0.046    0.308    0.479   0.501   0.501   0.501 1.337    17
## tau      165.236 139.584   19.738   34.595 154.672 278.295 443.913 1.055    34
## deviance -71.348  43.051 -124.315 -112.835 -96.397 -29.168 -13.969 1.049    38
## 
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
## 
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 901.9 and DIC = 830.6
## DIC is an estimate of expected predictive error (lower deviance is better).
chainArray.exp.f <- model_exp_f$BUGSoutput$sims.array

Testing Model I on Real Data

real_data_exp <- list("Y" = df_stars_exp$Absolute.magnitude.Mv.,"x" = log(df_stars_exp$Temperature..K.,base=10), "N" =length(df_stars$Temperature..K.))
inits = list(list(alpha = 2, beta = .2,gamma = 0.2,delta=-1.5, tau = .1),
             list(alpha = 2.5, beta = .2,gamma = 0.2,delta=-1.5, tau = .1))
params = c("alpha","beta","gamma","delta","k")

model_exp=jags(data=real_data_exp, inits=inits,
                parameters.to.save=params,
                model.file="models/exp_model.txt",
                n.chains=2,
                n.burnin=1000,
                n.iter=10000,
                n.thin = 10)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 40
##    Unobserved stochastic nodes: 6
##    Total graph size: 378
## 
## Initializing model
model_exp
## Inference for Bugs model at "models/exp_model.txt", fit using jags,
##  2 chains, each with 10000 iterations (first 1000 discarded), n.thin = 10
##  n.sims = 1800 iterations saved
##          mu.vect sd.vect   2.5%    25%    50%    75%   97.5%  Rhat n.eff
## alpha     -5.940   0.800 -7.803 -6.259 -5.788 -5.436  -4.910 1.081    59
## beta       0.752   0.164  0.400  0.644  0.775  0.889   0.987 1.009   170
## delta     -2.242   0.534 -3.264 -2.619 -2.256 -1.878  -1.215 1.027    84
## gamma      8.680   1.232  6.922  7.816  8.458  9.293  11.658 1.002  1100
## k          3.569   0.036  3.477  3.553  3.578  3.595   3.609 1.004   420
## deviance  93.936   3.213 89.899 91.472 93.236 95.602 101.782 1.018   110
## 
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
## 
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 5.1 and DIC = 99.1
## DIC is an estimate of expected predictive error (lower deviance is better).
chainArray.exp <- model_exp$BUGSoutput$sims.array
coda.fit.exp <- coda::as.mcmc(model_exp)

Density Plot Model I

Let’s have a look on the parameter distributions

bayesplot::mcmc_combo(coda.fit.exp)

Trace Plot Model I

First, we can check the convergence of the Markov chain through the trace plot: in particular, we can observe that the chains converge around the mean value of the parameters (shown in the next plot).

traceplot(model_exp)

Running Mean Plot Model I

The trend seen from the trace plot is even more evident if we consider the convergence of the running mean.

grid.arrange(plt.alpha,plt.beta,plt.gamma,plt.delta,plt.k)
## Warning: Removed 50 row(s) containing missing values (geom_path).
## Warning: Removed 1 row(s) containing missing values (geom_path).

## Warning: Removed 1 row(s) containing missing values (geom_path).

## Warning: Removed 1 row(s) containing missing values (geom_path).

Autocorrelation Plot Model I

An important aspect to take into account in these problems is the autocorrelation between successive samples: ideally we expect the autocorrelation to go to zero, since this implies that the generated points are independent and uncorrelated. For this reason the value of n.thin has been set equal to 10, meaning that we keep one sample out of every ten to reduce the correlation.
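The effect of thinning can be illustrated on a simulated chain (an AR(1) process here, not the actual sampler output): keeping one draw in ten sharply reduces the lag-1 autocorrelation.

```r
set.seed(42)
n <- 10000
rho <- 0.9                     # strong positive autocorrelation
chain <- numeric(n)
for (t in 2:n) chain[t] <- rho * chain[t - 1] + rnorm(1)  # AR(1) chain
thinned <- chain[seq(1, n, by = 10)]                      # keep one sample every ten
lag1_full <- acf(chain,   lag.max = 1, plot = FALSE)$acf[2]
lag1_thin <- acf(thinned, lag.max = 1, plot = FALSE)$acf[2]
c(full = lag1_full, thinned = lag1_thin)  # thinned chain is far less correlated
```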

coda.fit <- coda::as.mcmc(model_exp)
coda::acfplot(coda.fit.exp)

Quantitative Diagnostics for Model I

Raftery & Lewis Test for Model I

Raftery and Lewis’s diagnostic is a run-length control diagnostic based on a criterion of accuracy of estimation of the quantile q. The algorithm looks for the smallest thinning interval k that makes \(Z_t\) (the binarization of the chain \(\theta_t\) with respect to the quantile of \(\theta\)) behave as if it came from an independent 2-state Markov chain.

coda::raftery.diag(coda.fit)
## [[1]]
## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95 
## 
## You need a sample size of at least 3746 with these values of q, r and s
## 
## [[2]]
## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95 
## 
## You need a sample size of at least 3746 with these values of q, r and s

The algorithm returns the sample size N required to estimate the quantile q within an accuracy r with probability s.

Geweke Test for Model I

Geweke’s diagnostic is a convergence test for Markov chains, based on a test for equality of the means of the first and last parts of a chain (by default the first 10% and the last 50%). If the samples are drawn from a stationary distribution of the chain, the two means are equal and Geweke’s statistic has an asymptotically standard normal distribution. The test statistic is a standard Z-score: the difference between the two sample means divided by its estimated standard error. The standard error is estimated from the spectral density at zero, and so takes into account any autocorrelation. The Z-score is calculated under the assumption that the two parts of the chain are asymptotically independent. As the following plots show, the Z-score values are mostly in the acceptance region.

coda::geweke.diag(coda.fit)
## [[1]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##    alpha     beta    delta deviance    gamma        k 
##  -1.3868  -0.8031   1.1550  -0.2298  -2.1974   1.6494 
## 
## 
## [[2]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##    alpha     beta    delta deviance    gamma        k 
##  0.29983 -0.32439  0.01066 -1.25797 -1.78932  1.63398
coda::geweke.plot(coda.fit)

Gelman Test for Model I

The potential scale reduction factor is calculated for each variable in x, with upper and lower confidence limits. Approximate convergence is diagnosed when the upper limit is close to 1.

coda::gelman.diag(coda.fit)
## Potential scale reduction factors:
## 
##          Point est. Upper C.I.
## alpha          1.16       1.54
## beta           1.02       1.08
## delta          1.09       1.32
## deviance       1.04       1.18
## gamma          1.01       1.05
## k              1.01       1.03
## 
## Multivariate psrf
## 
## 1.07

Heidel Test for Model I

The Heidelberger and Welch diagnostic assesses convergence by testing the null hypothesis (at significance level 0.05) that the samples of the Markov chain are drawn from a stationary distribution. A failure of the stationarity test indicates that a longer MCMC run is needed. The half-width test calculates a 95% confidence interval for the mean, takes half of the width of this interval, and compares this value with the estimate of the mean.

coda::heidel.diag(coda.fit)
## [[1]]
##                                        
##          Stationarity start     p-value
##          test         iteration        
## alpha    passed       1         0.433  
## beta     passed       1         0.477  
## delta    passed       1         0.244  
## deviance passed       1         0.968  
## gamma    passed       1         0.436  
## k        passed       1         0.513  
##                                    
##          Halfwidth Mean   Halfwidth
##          test                      
## alpha    passed    -5.835 0.1070   
## beta     passed     0.766 0.0238   
## delta    passed    -2.301 0.0832   
## deviance passed    93.626 0.2153   
## gamma    passed     8.718 0.2170   
## k        passed     3.567 0.0060   
## 
## [[2]]
##                                        
##          Stationarity start     p-value
##          test         iteration        
## alpha    passed       1         0.492  
## beta     passed       1         0.287  
## delta    passed       1         0.368  
## deviance passed       1         0.117  
## gamma    passed       1         0.688  
## k        passed       1         0.970  
##                                    
##          Halfwidth Mean   Halfwidth
##          test                      
## alpha    passed    -6.045 0.20256  
## beta     passed     0.739 0.02143  
## delta    passed    -2.184 0.09854  
## deviance passed    94.245 0.33520  
## gamma    passed     8.643 0.16863  
## k        passed     3.571 0.00511

Point Estimate (Posterior Mean)

chainArray <- model_exp$BUGSoutput$sims.matrix
colMeans(chainArray)
##      alpha       beta      delta   deviance      gamma          k 
## -5.9398618  0.7523921 -2.2423108 93.9359244  8.6804553  3.5689866

Approximation Error

approx.error.exp.chain1
##         alpha1        beta1      gamma1       delta1           k1
## 1 0.0004519604 2.914493e-05 0.001739852 0.0002725208 1.444814e-06
approx.error.exp.chain2
##         alpha2        beta2      gamma2       delta2           k2
## 1 0.0009458643 3.036976e-05 0.001634321 0.0003550699 1.362142e-06

Equal-Tailed Credible Interval

cred <- 0.95
p.ET.jags <- apply(chainArray, 2, quantile, 
                    prob=c((1-cred)/2, 1-(1-cred)/2))
p.ET.jags
##           alpha      beta     delta deviance     gamma        k
## 2.5%  -7.803246 0.4004705 -3.263854  89.8989  6.922236 3.476738
## 97.5% -4.909834 0.9866157 -1.214734 101.7822 11.658286 3.608697

HPD

p.HPD.jags <- coda::HPDinterval(as.mcmc(chainArray))
p.HPD.jags
##              lower       upper
## alpha    -7.458420  -4.7885662
## beta      0.449416   0.9990595
## delta    -3.248838  -1.2072610
## deviance 89.667320 100.7858486
## gamma     6.733745  11.2139157
## k         3.494732   3.6102937
## attr(,"Probability")
## [1] 0.95

Model II

model
    {
        for( i in 1:N ) {
            Y[i] ~ dnorm(mu[i], tau)
            mu[i] <-  alpha + beta*x[i] + gamma*(x[i]^2)
        }
        alpha  ~ dnorm(0.0, 1.0E-12)
        beta   ~ dnorm(0.0, 1.0E-12)
        gamma  ~ dnorm(0.0, 1.0E-12)
        tau    ~ dgamma(0.001, 0.001)
        sigma  <- 1 / sqrt(tau)
    }

Testing Model II on Synthetic Data

################################ FAKE DATA GENERATION 
sintetic_data <- function(alpha,beta,gamma,n_sample,noise_factor,left_bound,right_bound){
  
  x  <- runif(n_sample, min = left_bound, max = right_bound)
  mu <- alpha + beta*x + gamma*(x^2)
  # note: rnorm() takes a standard deviation, not a precision as JAGS's dnorm does,
  # so the Gamma draws are used here directly as noise standard deviations
  noise_sd <- rgamma(n_sample, 0.001, 0.001)
  y <- rnorm(n_sample, mu, noise_sd)
  y_noise <- jitter(y, factor = noise_factor, amount = NULL)
  
  return(as.data.frame(list("Temperature" = x,
                            "Magnitude" = y,
                            "Magnitude_w_noise" = y_noise)))
}

sn_data = sintetic_data(alpha=-1,beta=-.8,gamma=.8,n_sample=40,noise_factor=800,left_bound=-2,right_bound=0.5)
################################ TESTING ON FAKE DATA AND RETRIVING PARAMETERS 

syn_data_j <- list("Y"=sn_data$Magnitude_w_noise,"x"=sn_data$Temperature,"N"=40)

inits = list(list(alpha = 1.2, beta = 0.2,  gamma = 0.2, tau = 1),
             list(alpha = 1.2, beta = 0.2,  gamma = 0.2, tau = 1))

params = c("alpha","beta","gamma","tau")
model_quad=jags(data=syn_data_j, inits=inits,
                parameters.to.save=params,
                model.file="models/quadratic_model.txt",
                n.chains=2,
                n.burnin=1000,
                n.iter=10000,
                n.thin = 10)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 40
##    Unobserved stochastic nodes: 4
##    Total graph size: 252
## 
## Initializing model
model_quad
## Inference for Bugs model at "models/quadratic_model.txt", fit using jags,
##  2 chains, each with 10000 iterations (first 1000 discarded), n.thin = 10
##  n.sims = 1800 iterations saved
##          mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
## alpha     -1.018   0.026  -1.072  -1.035  -1.019  -1.001  -0.966 1.001  1800
## beta      -0.853   0.061  -0.973  -0.894  -0.852  -0.811  -0.735 1.001  1800
## gamma      0.778   0.035   0.712   0.756   0.777   0.802   0.845 1.001  1800
## tau      113.673  27.674  66.546  94.315 111.427 130.226 174.514 1.002  1800
## deviance -74.905   3.424 -78.674 -77.118 -75.528 -73.413 -67.939 1.037   560
## 
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
## 
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 5.9 and DIC = -69.1
## DIC is an estimate of expected predictive error (lower deviance is better).

Testing Model II on Real Data

real_data_quad <- list("Y" = df_stars_quad$Absolute.magnitude.Mv.,"x" = log(df_stars_quad$Temperature..K.,base=10), "N" =length(df_stars$Temperature..K.))
inits = list(list(alpha = 1.2, beta = 1.2,  gamma = 0.2, tau = 1),
             list(alpha = 1.2, beta = 1.2,  gamma = 0.2, tau = 1))

params = c("alpha", "beta", "gamma", "tau")

model_quad=jags(data=real_data_quad, inits=inits,
                parameters.to.save=params,
                model.file="models/quadratic_model.txt",
                n.chains=2,
                n.burnin=1000,
                n.iter=10000,
                n.thin = 10)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 40
##    Unobserved stochastic nodes: 4
##    Total graph size: 252
## 
## Initializing model
model_quad
## Inference for Bugs model at "models/quadratic_model.txt", fit using jags,
##  2 chains, each with 10000 iterations (first 1000 discarded), n.thin = 10
##  n.sims = 1800 iterations saved
##          mu.vect sd.vect     2.5%      25%     50%     75%   97.5%  Rhat n.eff
## alpha    218.652  26.122  165.694  201.607 218.960 236.229 269.680 1.001  1800
## beta     -95.257  12.732 -119.898 -103.803 -95.466 -86.918 -69.358 1.001  1500
## gamma     10.157   1.545    7.007    9.149  10.181  11.191  13.133 1.001  1700
## tau        1.870   0.428    1.106    1.567   1.838   2.135   2.789 1.002  1500
## deviance  89.474   2.886   85.934   87.421  88.815  90.868  96.505 1.001  1800
## 
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
## 
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 4.2 and DIC = 93.6
## DIC is an estimate of expected predictive error (lower deviance is better).
chainArray.quad <- model_quad$BUGSoutput$sims.array
coda.fit.quad <- coda::as.mcmc(model_quad)

Density Plot Model II

bayesplot::mcmc_combo(coda.fit.quad)

Trace Plot Model II

traceplot(model_quad)

Running Mean Plot Model II

grid.arrange(plt.alpha,plt.beta,plt.gamma)

Autocorrelation Plot Model II

coda::acfplot(coda.fit.quad)

Quantitative Diagnostics for Model II

Raftery & Lewis Test for Model II

coda::raftery.diag(coda.fit.quad)
## [[1]]
## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95 
## 
## You need a sample size of at least 3746 with these values of q, r and s
## 
## [[2]]
## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95 
## 
## You need a sample size of at least 3746 with these values of q, r and s

Geweke Test for Model II

coda::geweke.diag(coda.fit.quad)
## [[1]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##    alpha     beta deviance    gamma      tau 
## -0.07315  0.05418 -1.23611 -0.04208  0.68600 
## 
## 
## [[2]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##    alpha     beta deviance    gamma      tau 
##  -0.5765   0.5954  -1.1034  -0.6107  -1.1262

Gelman Test for Model II

coda::gelman.diag(coda.fit.quad)
## Potential scale reduction factors:
## 
##          Point est. Upper C.I.
## alpha             1       1.00
## beta              1       1.00
## deviance          1       1.00
## gamma             1       1.00
## tau               1       1.02
## 
## Multivariate psrf
## 
## 1.01

Heidel Test for Model II

coda::heidel.diag(coda.fit.quad)
## [[1]]
##                                        
##          Stationarity start     p-value
##          test         iteration        
## alpha    passed       1         0.890  
## beta     passed       1         0.960  
## deviance passed       1         0.213  
## gamma    passed       1         0.967  
## tau      passed       1         0.367  
##                                    
##          Halfwidth Mean   Halfwidth
##          test                      
## alpha    passed    219.31 1.6205   
## beta     passed    -95.59 0.7888   
## deviance passed     89.48 0.1941   
## gamma    passed     10.20 0.0957   
## tau      passed      1.86 0.0282   
## 
## [[2]]
##                                        
##          Stationarity start     p-value
##          test         iteration        
## alpha    passed       1         0.840  
## beta     passed       1         0.823  
## deviance passed       1         0.626  
## gamma    passed       1         0.806  
## tau      passed       1         0.673  
##                                    
##          Halfwidth Mean   Halfwidth
##          test                      
## alpha    passed    217.99 1.6864   
## beta     passed    -94.93 0.8219   
## deviance passed     89.47 0.1830   
## gamma    passed     10.12 0.0997   
## tau      passed      1.88 0.0303

Point Estimate Model II (Posterior Mean)

chainArray.quad <- model_quad$BUGSoutput$sims.matrix
colMeans(chainArray.quad)
##      alpha       beta   deviance      gamma        tau 
## 218.651588 -95.256855  89.474228  10.157337   1.870342

Equal-Tailed Credible Interval Model II

cred <- 0.95
p.ET.jags <- apply(chainArray.quad, 2, quantile, 
                    prob=c((1-cred)/2, 1-(1-cred)/2))
p.ET.jags
##          alpha       beta deviance     gamma      tau
## 2.5%  165.6943 -119.89850 85.93388  7.007062 1.106489
## 97.5% 269.6802  -69.35847 96.50507 13.133205 2.789119

HPD Model II

p.HPD.jags <- coda::HPDinterval(as.mcmc(chainArray.quad))
p.HPD.jags
##                lower      upper
## alpha     169.435812 272.344182
## beta     -121.337211 -70.981509
## deviance   85.592845  94.828928
## gamma       7.271441  13.389959
## tau         1.072382   2.718793
## attr(,"Probability")
## [1] 0.95

Which regression model is best?

data.frame("Model" = c("Exponential", "Quadratic"),
           "DIC" = c(model_exp$BUGSoutput$DIC,model_quad$BUGSoutput$DIC))
##         Model      DIC
## 1 Exponential 99.05368
## 2   Quadratic 93.64199

Model Comparison in Predictions

Finally, we can compare how the two models perform in prediction. Considering some missing points, we can predict using the posterior predictive distribution, in the following way:

\[\begin{align*} f(y^{pred}|y) = \int_\Theta f(y^{pred}|\theta,y,x)\, \pi(\theta|y)\, d\theta \end{align*}\] where \(\Theta\) is the parameter space, \(\pi(\theta|y)\) is the posterior, \(y^{pred}\) are the predictions and \(x_i\) are the input features. From the two plots we can observe that the quadratic model works better (as the DIC comparison already indicated): the predictions agree better with the quadratic fit than with the mixed exponential fit.
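In R, the integral above is approximated by averaging over the saved posterior draws: for each draw of \((\alpha,\beta,\gamma,\tau)\) we sample one \(y^{pred}\) from the likelihood. The sketch below uses a simulated stand-in for `model_quad$BUGSoutput$sims.matrix` (independent draws centered on the posterior summaries above, ignoring posterior correlations), so the numbers are illustrative only.

```r
set.seed(123)
n_draws <- 1800
# stand-in for model_quad$BUGSoutput$sims.matrix (illustrative, independent draws)
sims <- data.frame(alpha = rnorm(n_draws, 218.7, 26.1),
                   beta  = rnorm(n_draws, -95.3, 12.7),
                   gamma = rnorm(n_draws, 10.2, 1.5),
                   tau   = rgamma(n_draws, 19, 10))  # precision draws, mean ~1.9
x_new <- 4.0   # a new log10(Temperature) value at which to predict
mu_draws <- sims$alpha + sims$beta * x_new + sims$gamma * x_new^2
y_pred <- rnorm(n_draws, mu_draws, 1 / sqrt(sims$tau))  # one y^pred per posterior draw
quantile(y_pred, c(0.025, 0.5, 0.975))  # pointwise 95% posterior predictive interval
```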

grid.arrange(plt.c_e,plt.c_q,nrow=1)
## Warning: Removed 11 rows containing missing values (geom_point).
## Warning: Removed 29 rows containing missing values (geom_point).
## Warning: Removed 11 rows containing missing values (geom_point).
## Warning: Removed 29 rows containing missing values (geom_point).

Frequentist Approach vs Bayesian Approach Comparison

model_quad
## Inference for Bugs model at "models/quadratic_model.txt", fit using jags,
##  2 chains, each with 10000 iterations (first 1000 discarded), n.thin = 10
##  n.sims = 1800 iterations saved
##          mu.vect sd.vect     2.5%      25%     50%     75%   97.5%  Rhat n.eff
## alpha    218.652  26.122  165.694  201.607 218.960 236.229 269.680 1.001  1800
## beta     -95.257  12.732 -119.898 -103.803 -95.466 -86.918 -69.358 1.001  1500
## gamma     10.157   1.545    7.007    9.149  10.181  11.191  13.133 1.001  1700
## tau        1.870   0.428    1.106    1.567   1.838   2.135   2.789 1.002  1500
## deviance  89.474   2.886   85.934   87.421  88.815  90.868  96.505 1.001  1800
## 
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
## 
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 4.2 and DIC = 93.6
## DIC is an estimate of expected predictive error (lower deviance is better).
x.1 <- log(df_stars_quad$Temperature..K.,base=10)
x.2 <- x.1^2
quadModel <- lm(Absolute.magnitude.Mv.~ x.1+x.2 , data=df_stars_quad)
summary(quadModel)
## 
## Call:
## lm(formula = Absolute.magnitude.Mv. ~ x.1 + x.2, data = df_stars_quad)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.35813 -0.25115 -0.02477  0.60648  1.24166 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  219.855     26.291   8.362 4.73e-10 ***
## x.1          -95.850     12.816  -7.479 6.58e-09 ***
## x.2           10.230      1.555   6.578 1.04e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7317 on 37 degrees of freedom
## Multiple R-squared:  0.9611, Adjusted R-squared:  0.959 
## F-statistic: 457.1 on 2 and 37 DF,  p-value: < 2.2e-16

Classification: The Logit-Binomial Model

\[\begin{align*} Y_i \sim Bern(p_i) \end{align*}\]

\[\begin{align*} logit(p_i) = log(\frac{p_i}{1-p_i}) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_4x_4 \end{align*}\]

model{
  # Likelihood
  for (i in 1:N){
    y[i] ~ dbern(p[i])
    logit(p[i]) <- (beta0+beta1*(x1[i])+beta2*(x2[i])+beta3*(x3[i])+beta4*(x4[i]))
  }

beta0 ~ dunif(min(x1),max(x1))
beta1 ~ dunif(min(x1),max(x1))
beta2 ~ dunif(min(x2),max(x2))
beta3 ~ dunif(min(x3),max(x3))
beta4 ~ dnorm(0.0, 1.0E+2)
}
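Given posterior summaries of the coefficients, class probabilities are obtained by inverting the logit link, which in R is `plogis`. A minimal sketch, using the posterior means reported in the Heidel diagnostics below and a made-up feature vector:

```r
# posterior means of beta0..beta4 (taken from the Heidel test output below)
betas <- c(3.658, 3.493, 5.770, 1.451, -0.433)
x_obs <- c(1, 3.5, 0.5, 0.2, 10)  # intercept followed by x1..x4 (illustrative values)
eta <- sum(betas * x_obs)         # linear predictor beta0 + beta1*x1 + ... + beta4*x4
p <- plogis(eta)                  # inverse logit: 1 / (1 + exp(-eta))
p                                 # probability of class 1
```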

Heidel Test Model III

coda::heidel.diag(coda.fit.logit)
## [[1]]
##                                        
##          Stationarity start     p-value
##          test         iteration        
## beta0    passed       1         0.2868 
## beta1    passed       1         0.4661 
## beta2    passed       1         0.0884 
## beta3    passed       1         0.3318 
## beta4    passed       1         0.2176 
## deviance passed       1         0.8075 
##                                    
##          Halfwidth Mean   Halfwidth
##          test                      
## beta0    passed     3.658 0.01461  
## beta1    passed     3.493 0.00475  
## beta2    passed     5.770 0.00885  
## beta3    passed     1.451 0.03348  
## beta4    passed    -0.433 0.00411  
## deviance passed    13.434 0.30534  
## 
## [[2]]
##                                        
##          Stationarity start     p-value
##          test         iteration        
## beta0    passed         1       0.8930 
## beta1    passed         1       0.0839 
## beta2    passed        91       0.2638 
## beta3    passed         1       0.1630 
## beta4    passed       181       0.2046 
## deviance passed         1       0.4342 
##                                    
##          Halfwidth Mean   Halfwidth
##          test                      
## beta0    passed     3.666 0.01452  
## beta1    passed     3.492 0.00454  
## beta2    passed     5.765 0.00989  
## beta3    passed     1.480 0.03297  
## beta4    passed    -0.435 0.00462  
## deviance passed    13.452 0.30641

Classification: The Probit-Binomial Model
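Analogously to the logit model, the probit model links the success probability to the linear predictor through the standard normal CDF \(\Phi\); here the predictor uses the two interaction terms that appear in the JAGS code below:

\[\begin{align*} Y_i \sim Bern(p_i) \end{align*}\]

\[\begin{align*} probit(p_i) = \Phi^{-1}(p_i) = \beta_0 + \beta_1 x_1 x_2 + \beta_2 x_3 x_4 \end{align*}\]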

model{
  # Likelihood
  for (i in 1:N){
    y[i] ~ dbern(p[i])
    probit(p[i]) <- (beta0+(beta1*x1[i]*x2[i])+(beta2*x3[i]*x4[i]))
  }

  beta0 ~ dnorm(0.0, 10)
  beta1 ~ dnorm(0.0, 100)
  beta2 ~ dnorm(0.0, 10)
}

Heidel Test Model IV

coda::heidel.diag(coda.probit.fit)
## [[1]]
##                                        
##          Stationarity start     p-value
##          test         iteration        
## beta0    passed       1         0.179  
## beta1    passed       1         0.462  
## beta2    passed       1         0.480  
## deviance passed       1         0.473  
##                                   
##          Halfwidth Mean  Halfwidth
##          test                     
## beta0    passed    0.408 0.02079  
## beta1    passed    0.230 0.00587  
## beta2    passed    0.167 0.01049  
## deviance passed    2.048 0.14314  
## 
## [[2]]
##                                        
##          Stationarity start     p-value
##          test         iteration        
## beta0    passed       1         0.238  
## beta1    passed       1         0.682  
## beta2    passed       1         0.891  
## deviance passed       1         0.322  
##                                   
##          Halfwidth Mean  Halfwidth
##          test                     
## beta0    passed    0.418 0.02287  
## beta1    passed    0.234 0.00564  
## beta2    passed    0.169 0.01027  
## deviance passed    1.962 0.15352
# Predictions
# posterior means of the probit model coefficients
beta0 = 0.676
beta1 = 0.242
beta2 = 0.198


accuracy <- function(dataset, star.type, label) {
  df_stars_test <- dataset[dataset$Star.type == star.type,]
  
  x1 = log(df_stars_test$Temperature..K., base = 10)
  x2 = log(df_stars_test$Luminosity.L.Lo., base = 10)
  x3 = log(df_stars_test$Radius.R.Ro., base = 10)
  x4 = df_stars_test$Absolute.magnitude.Mv.
  #######################
  o <- beta0 + beta1*(x1*x2) + beta2*(x3*x4)
  # probit model: invert the link with the standard normal CDF.
  # (pnorm(o) and the logistic 1/(1+exp(-o)) both cross 0.5 at o = 0,
  #  so the 0/1 predictions are unchanged by this correction)
  o_perc <- pnorm(o)
  #######################
  pred <- ifelse(o_perc < 0.5, 0, 1)
  
  if (label == "avanzato") {            # accuracy w.r.t. the "advanced" label (class 1)
    acc <- mean(pred == 1)
  } else if (label == "primordiale") {  # accuracy w.r.t. the "primordial" label (class 0)
    acc <- mean(pred == 0)
  }
  return(acc)
} 

primordiale.df <- data.frame(
"label.type" = c(0,1,2,3,4,5),
"acc.type"  = c(1,1,1,0.05,0,0.55)
)

avanzato.df <- data.frame(
  "label.type" = c(0,1,2,3,4,5),
  "acc.type"   = c(0,0,0,0.95,1,0.45)
)

acc.plt1 <- ggplot(primordiale.df, aes(x=label.type, y=acc.type, group=4))+ geom_line(linetype = "dashed")+geom_point(color="red")+
   ggtitle("Classification Accuracy to Primordial Stars State") +
  xlab("Star Type") + ylab("Accuracy")

acc.plt2 <- ggplot(avanzato.df, aes(x=label.type, y=acc.type, group=4))+ geom_line(linetype = "dashed")+geom_point(color="red")+
  ggtitle("Classification Accuracy to Advanced Stars State") +
  xlab("Star Type") + ylab("Accuracy")
  
accuracy(dataset=stars.df.sorted,star.type=c("5"),label="avanzato")
## [1] 0.45
grid.arrange(acc.plt1,acc.plt2, ncol=2)